Document Level Sentiment Analysis with Deep Learning Models
Contents
Document Level Sentiment Analysis with Deep Learning Models#
Load Twitter Datasets#
# Load TwEmLab Goldstandard
tree1 = ET.parse('../Data/twemlab_goldstandards_original/birmingham_labels.xml')
root1 = tree1.getroot()

# check contents
#root1[0][1].text

# create dataframe from xml file
| | Text | Label | sentiment_label |
|---|---|---|---|
| 8 | The unnamed woman, in her 30s, had been on the Twister in Sheldon Country park in Birmingham when the tragedy struck just before 5pm | sadness | -1 |
| 196 | Sunday spent with this one #love #Sunday #weekend #boyfriend @ Sutton Park | happiness | 1 |
| 611 | #wearingblackformaroua #team237 @ Sheldon Country Park | none | 0 |
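The gold standard is loaded by parsing the XML file with ElementTree and collecting each tweet's text and label into a DataFrame. A minimal sketch of that pattern (the element names `tweet`, `content`, and `label` are illustrative assumptions here, not the actual TwEmLab schema):

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Illustrative XML standing in for birmingham_labels.xml;
# the tag names below are assumptions, not the real TwEmLab schema.
xml_data = """
<tweets>
  <tweet id="8"><content>The unnamed woman ...</content><label>sadness</label></tweet>
  <tweet id="196"><content>Sunday spent with this one</content><label>happiness</label></tweet>
</tweets>
"""

root = ET.fromstring(xml_data)

# Collect one row per <tweet> element
rows = [
    {"Text": t.findtext("content"), "Label": t.findtext("label")}
    for t in root.iter("tweet")
]
df = pd.DataFrame(rows)

# Map the emotion labels onto the -1/0/1 sentiment scheme used above
sentiment_map = {"happiness": 1, "none": 0, "sadness": -1}
df["sentiment_label"] = df["Label"].map(sentiment_map)
print(df)
```

With the real file, `ET.parse(path).getroot()` replaces `ET.fromstring(xml_data)` and the rest of the pattern is unchanged.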
SemEval Goldstandard dataset size: 3713
Polarity: (1 pos, 0 neu, -1 neg)
Sample of the data:
| | Text | Polarity | sentiment |
|---|---|---|---|
| 1756 | Service and food is what any one would expect when spending that type of money. | neutral | 0 |
| 5 | For those that go once and don't enjoy it, all I can say is that they just don't get it. | positive | 1 |
| 498 | We generally like good restaurants and eat out often but Kai was way to expensive for what we got. | negative | -1 |
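The SemEval polarity strings map onto the same integer scheme (1 pos, 0 neu, -1 neg). A small sketch of the conversion with pandas (the rows here are shortened stand-ins for the table above):

```python
import pandas as pd

# Toy rows mimicking the SemEval sample above (text shortened)
df = pd.DataFrame({
    "Text": ["Service and food ...", "For those that go once ...", "We generally like ..."],
    "Polarity": ["neutral", "positive", "negative"],
})

# 1 pos, 0 neu, -1 neg, as stated above
df["sentiment"] = df["Polarity"].map({"positive": 1, "neutral": 0, "negative": -1})
print(df["sentiment"].value_counts())
```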
AIFER example dataset size: 11177
Sample of the data:
| | Date | Text | Language |
|---|---|---|---|
| 4565 | 2021-07-16 10:37:31 | @Larapic Thanks Lara! We did lose a bunch of stuff worth several thousands though the repair of the building will probably cost a lot more. And the cleaning will be a huge hassle. 😬 But as long as my family is fine, we'll be ok. 💛 | en |
| 1810 | 2021-07-07 08:55:02 | @Karolin63676283 @Aynqa @AsiaAda992 @AKrysztofinska @anika1_2 @BeataGdula @warto_rozrabiac @paulbrzez @Katarzy56700744 @wichniarek18 @Gymshark @AldonaMarciniak @iga_swiatek @ciwhiskey @BorekMati @Polsport @Paolcia_ Musi ✌🏻 a co u ciebie ? | pl |
| 1518 | 2021-07-06 11:59:38 | @Lilly27mia Hat es schon, danke | de |
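Since the AIFER tweets are multilingual, a common first step is to subset on the Language column, e.g. keeping only English tweets before applying an English-only sentiment model. A sketch (column names follow the table above; the rows are shortened stand-ins):

```python
import pandas as pd

# Toy rows mimicking the AIFER sample above
df = pd.DataFrame({
    "Date": ["2021-07-16 10:37:31", "2021-07-07 08:55:02", "2021-07-06 11:59:38"],
    "Text": ["Thanks Lara! ...", "Musi, a co u ciebie?", "Hat es schon, danke"],
    "Language": ["en", "pl", "de"],
})

# Keep only English tweets, e.g. for an English-only sentiment model
df_en = df[df["Language"] == "en"].reset_index(drop=True)
print(len(df_en), "of", len(df), "tweets are English")
```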
Transformers-based NLP Models#
Why Transformers?#
In recent years, the transformer model has revolutionized the field of NLP. This deep learning approach has been highly successful in a variety of NLP tasks, including sentiment analysis, and has been shown to outperform both traditional machine learning and earlier deep learning methods. Some of its key advantages are:
The encoder-decoder framework: the encoder generates a representation of the input (semantic, contextual, positional) and the decoder generates the output. A common use case is sequence-to-sequence translation.
Attention mechanisms: these deal with the information bottleneck of the traditional encoder-decoder architecture (where only the final encoder hidden state is passed to the decoder) by allowing the decoder to access the encoder's hidden states at every step and to prioritise whichever state is most relevant.
Transfer learning: fine-tuning a pre-trained language model on a task-specific dataset, which greatly reduces the amount of labelled data needed.
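The transfer-learning idea can be sketched in a few lines of plain PyTorch: the pre-trained body is frozen and only a small, newly added classification head is trained. This is a toy illustration of the principle (the "body" here is a stand-in, not an actual pre-trained Transformer):

```python
import torch
from torch import nn

# Stand-in for a pre-trained encoder body (in practice: a Transformer)
body = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

# Freeze the pre-trained weights
for p in body.parameters():
    p.requires_grad = False

# New task-specific head: 3 sentiment classes (neg / neu / pos)
head = nn.Linear(32, 3)

# Only the head's parameters are handed to the optimiser
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(4, 16)          # a toy batch of 4 "documents"
y = torch.tensor([0, 1, 2, 1])  # toy sentiment labels
logits = head(body(x))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()
```

Because the body is frozen, gradients flow only into the head, so very little labelled data is needed to adapt the model to a new task.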

A note on Attention#
In transformers, multi-head scaled-dot product attention is usually used. This attention mechanism allows the Transformer to capture global dependencies between different positions in the input sequence, and to weigh the importance of different parts of the input when making predictions.
In scaled dot-product attention, dot products between the query and key vectors determine how much each position attends to every other position; after scaling by the square root of the key dimension and applying a softmax, the resulting weights form a weighted sum of the value vectors. The attention mechanism is repeated multiple times with different linear projections (hence “multi-head”) to capture different representations of the input.
Code implementation
import torch
from torch import nn
from torch.nn.functional import scaled_dot_product_attention

class AttentionHead(nn.Module):
    """A single attention head: projects the hidden state to queries, keys
    and values, then applies scaled dot-product attention."""
    def __init__(self, embed_dim, head_dim):
        super().__init__()
        self.q = nn.Linear(embed_dim, head_dim)
        self.k = nn.Linear(embed_dim, head_dim)
        self.v = nn.Linear(embed_dim, head_dim)

    def forward(self, hidden_state):
        attn_outputs = scaled_dot_product_attention(
            self.q(hidden_state), self.k(hidden_state), self.v(hidden_state))
        return attn_outputs

class MultiHeadAttention(nn.Module):
    """Runs several attention heads in parallel and mixes their outputs."""
    def __init__(self, config):
        super().__init__()
        embed_dim = config.hidden_size
        num_heads = config.num_attention_heads
        head_dim = embed_dim // num_heads
        self.heads = nn.ModuleList(
            [AttentionHead(embed_dim, head_dim) for _ in range(num_heads)]
        )
        self.output_linear = nn.Linear(embed_dim, embed_dim)

    def forward(self, hidden_state):
        # Concatenate the per-head outputs back to embed_dim, then mix them
        x = torch.cat([h(hidden_state) for h in self.heads], dim=-1)
        x = self.output_linear(x)
        return x
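To see the scaled dot-product step in isolation, here is a self-contained numeric sketch in plain torch with toy dimensions: the attention weights are softmax(QKᵀ/√d_k), and the output is a weighted sum of the values.

```python
import math
import torch
import torch.nn.functional as F

torch.manual_seed(0)
seq_len, head_dim = 5, 8
q = torch.randn(seq_len, head_dim)
k = torch.randn(seq_len, head_dim)
v = torch.randn(seq_len, head_dim)

# Attention scores: similarity of every query with every key, scaled by sqrt(d_k)
scores = q @ k.T / math.sqrt(head_dim)

# Softmax over the key dimension -> each row is a probability distribution
weights = F.softmax(scores, dim=-1)

# Output: weighted sum of the value vectors
out = weights @ v
print(weights.shape, out.shape)  # each position attends over all 5 positions
```

Each row of `weights` sums to 1, so every output position is a convex combination of the value vectors.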
Here’s a visual representation of the attention mechanism at work with the demo text “The hurricane trashed our entire garden”:
from IPython.display import display, HTML
display(HTML(url='https://raw.githubusercontent.com/Christina1281995/demo-repo/main/neuron_view.html'))